Improving Translation Selection with a New Translation Model Trained by Independent Monolingual Corpora

نویسندگان

  • Ming Zhou
  • Ding Yuan
  • Changning Huang
چکیده

We propose a novel statistical translation model to improve translation selection of collocation. In the statistical approach that has been popularly applied for translation selection, bilingual corpora are used to train the translation model. However, there exists a formidable bottleneck in acquiring large-scale bilingual corpora, in particular for language pairs involving Chinese. In this paper, we propose a new approach to training the translation model by using unrelated monolingual corpora. First, a Chinese corpus and an English corpus are parsed with dependency parsers, respectively, and two dependency triple databases are generated. Then, the similarity between a Chinese word and an English word can be estimated using the two monolingual dependency triple databases with the help of a simple Chinese-English dictionary. This cross-language word similarity is used to simulate the word translation probability. Finally, the generated translation model is used together with the language model trained with the English dependency database to realize translation of Chinese collocations into English. To demonstrate the effectiveness of this method, we performed various experiments with verb-object collocation translation. The experiments produced very promising results.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improving Translation Lexicon Induction from Monolingual Corpora via Dependency Contexts and Part-of-Speech Equivalences

This paper presents novel improvements to the induction of translation lexicons from monolingual corpora using multilingual dependency parses. We introduce a dependency-based context model that incorporates long-range dependencies, variable context sizes, and reordering. It provides a 16% relative improvement over the baseline approach that uses a fixed context window of adjacent words. Its Top...

متن کامل

Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information

To adapt a translation model trained from the data in one domain to another, previous works paid more attention to the studies of parallel corpus while ignoring the in-domain monolingual corpora which can be obtained more easily. In this paper, we propose a novel approach for translation model adaptation by utilizing in-domain monolingual topic information instead of the in-domain bilingual cor...

متن کامل

Estimating Word Translation Probabilities from Unrelated Monolingual Corpora Using the EM Algorithm

Selecting the right word translation among several op tions in the lexicon is a core problem for machine trans lation We present a novel approach to this problem that can be trained using only unrelated monolingual corpora and a lexicon By estimating word translation probabilities using the EM algorithm we extend upon target language modeling We construct a word trans lation model for German an...

متن کامل

Collocation Translation Acquisition Using Monolingual Corpora

Collocation translation is important for machine translation and many other NLP tasks. Unlike previous methods using bilingual parallel corpora, this paper presents a new method for acquiring collocation translations by making use of monolingual corpora and linguistic knowledge. First, dependency triples are extracted from Chinese and English corpora with dependency parsers. Then, a dependency ...

متن کامل

Domain Adaptation for Statistical Machine Translation with Domain Dictionary and Monolingual Corpora

tra Statistical machine translation systems are usually trained on large amounts of bilingual text and monolingual text. In this paper, we propose a method to perform domain adaptation for statistical machine translation, where in-domain bilingual corpora do not exist. This method first uses out-of-domain corpora to train a baseline system and then uses in-domain translation dictionaries and in...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • IJCLCLP

دوره 6  شماره 

صفحات  -

تاریخ انتشار 2001